Problem Statement¶
Business Context¶
Workplace safety in hazardous environments like construction sites and industrial plants is crucial to prevent accidents and injuries. One of the most important safety measures is ensuring workers wear safety helmets, which protect against head injuries from falling objects and machinery. Non-compliance with helmet regulations increases the risk of serious injuries or fatalities, making effective monitoring essential, especially in large-scale operations where manual oversight is prone to errors and inefficiency.
To overcome these challenges, SafeGuard Corp plans to develop an automated image analysis system capable of detecting whether workers are wearing safety helmets. This system will improve safety enforcement, ensuring compliance and reducing the risk of head injuries. By automating helmet monitoring, SafeGuard aims to enhance efficiency, scalability, and accuracy, ultimately fostering a safer work environment while minimizing human error in safety oversight.
Objective¶
As a data scientist at SafeGuard Corp, you are tasked with developing an image classification model that classifies images into one of two categories:
- With Helmet: Workers wearing safety helmets.
- Without Helmet: Workers not wearing safety helmets.
Data Description¶
The dataset consists of 631 images, divided into two nearly balanced categories:
- With Helmet: 311 images showing workers wearing helmets.
- Without Helmet: 320 images showing workers not wearing helmets.
Dataset Characteristics:
- Variations in Conditions: Images include diverse environments such as construction sites, factories, and industrial settings, with variations in lighting, angles, and worker postures to simulate real-world conditions.
- Worker Activities: Workers are depicted in different actions such as standing, using tools, or moving, ensuring robust model learning for various scenarios.
Installing and Importing the Necessary Libraries¶
%pip install numpy pandas matplotlib seaborn scikit-learn opencv-python tensorflow keras pillow -q
Note: you may need to restart the kernel to use updated packages.
import tensorflow as tf
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print(tf.__version__)
Num GPUs Available: 0
2.20.0
Note:
After running the above cell, kindly restart the notebook kernel (for Jupyter Notebook) or runtime (for Google Colab) and run all cells sequentially from the next cell.
On executing the above line of code, you might see a warning regarding package dependencies. This warning can be ignored, as the code above ensures that all necessary libraries and their dependencies are installed to successfully execute the code in this notebook.
import os
import random
import numpy as np # Importing numpy for matrix operations
import pandas as pd # Importing pandas to read CSV files
import seaborn as sns # Importing seaborn for statistical plots
import matplotlib.image as mpimg # Importing matplotlib.image to read image files
import matplotlib.pyplot as plt # Importing matplotlib for plotting and visualizing images
import math # Importing the math module to perform mathematical operations
import cv2 # Importing OpenCV for image processing
# Tensorflow modules
import keras
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator # Importing ImageDataGenerator for data augmentation
from tensorflow.keras.models import Sequential # Importing Sequential to define a sequential model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization # Importing the layers used to build our CNN model
from tensorflow.keras.optimizers import Adam, SGD # Importing the optimizers which can be used in our model
from sklearn import preprocessing # Importing the preprocessing module to preprocess the data
from sklearn.model_selection import train_test_split # Importing train_test_split to split the data into train and test sets
from sklearn.metrics import confusion_matrix # Importing confusion_matrix to plot the confusion matrix
from tensorflow.keras.models import Model
from keras.applications.vgg16 import VGG16 # Importing the VGG16 pre-trained model
# Display images using OpenCV (only needed on Google Colab)
# from google.colab.patches import cv2_imshow
# Importing functions for evaluating the performance of machine learning models
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, recall_score, precision_score, classification_report
from sklearn.metrics import mean_squared_error as mse # Importing mean_squared_error as mse
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
# Set the seed using keras.utils.set_random_seed. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
tf.keras.utils.set_random_seed(812)
Data Overview¶
Loading the data¶
images = np.load('images_proj.npy')
labels = pd.read_csv('labels_proj.csv')
print(f'Images shape: {images.shape}')
print(f'Labels shape: {labels.shape}')
print(labels.value_counts())
Images shape: (631, 200, 200, 3)
Labels shape: (631, 1)
Label
0    320
1    311
Name: count, dtype: int64
print("minimum value of the image array is ",np.min(images[0]))
print("maximum value of the image array is",np.max(images[0]))
minimum value of the image array is  0
maximum value of the image array is 255
Observations¶
- The number of images available for the train/test split is small (631).
- The class counts (320 vs. 311) can be considered balanced.
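Given the small sample size, simple augmentation could stretch the data further. A minimal numpy sketch (the helper `augment_with_flips` is hypothetical, not part of this notebook), assuming the images are stacked as an (N, H, W, C) array as loaded above:

```python
import numpy as np

def augment_with_flips(images, labels):
    """Append a horizontally flipped copy of every image (axis 2 is width
    in an (N, H, W, C) array), doubling the dataset size."""
    flipped = images[:, :, ::-1, :]
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

# Example on dummy data shaped like the project images
imgs = np.random.randint(0, 256, size=(4, 200, 200, 3), dtype=np.uint8)
lbls = np.array([0, 1, 0, 1])
aug_imgs, aug_lbls = augment_with_flips(imgs, lbls)
print(aug_imgs.shape, aug_lbls.shape)  # (8, 200, 200, 3) (8,)
```

Keras's ImageDataGenerator, imported above, can apply the same kind of flips on the fly via `horizontal_flip=True`.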
Exploratory Data Analysis¶
Plot random images from each of the classes and print their corresponding labels.¶
def plot_sample_images(images, labels, num_samples=4):
    with_helmet_indices = labels[labels['Label'] == 1].index.tolist()
    without_helmet_indices = labels[labels['Label'] == 0].index.tolist()
    plt.figure(figsize=(12, 6))
    # Plot random images with helmet
    for i in range(num_samples):
        idx = random.choice(with_helmet_indices)
        plt.subplot(2, num_samples, i + 1)
        plt.imshow(images[idx])
        plt.title('With Helmet')
        plt.axis('off')
    # Plot random images without helmet
    for i in range(num_samples):
        idx = random.choice(without_helmet_indices)
        plt.subplot(2, num_samples, num_samples + i + 1)
        plt.imshow(images[idx])
        plt.title('Without Helmet')
        plt.axis('off')
    plt.tight_layout()
    plt.show()
# Display random samples of images with and without helmets
plot_sample_images(images,labels)
Observation¶
- The dataset looks fairly simple: the without-helmet images are mostly close-ups of faces.
- Running this sampling multiple times shows that almost all without-helmet images are close-ups.
Checking for class imbalance¶
# Count the number of images in each class
class_distribution = labels['Label'].value_counts()
# Create a bar plot
plt.figure(figsize=(10, 6))
sns.barplot(x=class_distribution.index, y=class_distribution.values)
plt.title('Distribution of Classes')
plt.xlabel('Class (0: Without Helmet, 1: With Helmet)')
plt.ylabel('Number of Images')
# Add value labels on top of each bar
for i, v in enumerate(class_distribution.values):
    plt.text(i, v, str(v), ha='center', va='bottom')
plt.show()
# Calculate the percentage distribution
percentage_distribution = (class_distribution / len(labels) * 100).round(2)
print("\nPercentage Distribution:")
for class_label, percentage in percentage_distribution.items():
    print(f"Class {class_label}: {percentage}%")
Percentage Distribution:
Class 0: 50.71%
Class 1: 49.29%
Observations¶
- As observed earlier, the class distribution is fairly even.
- The dataset looks fairly simple, as the without-helmet images are mostly close-ups of faces.
- The number of images available for the train/test split is small (631).
- The class counts (320 vs. 311) can be considered balanced.
- The data needs to be normalized before use, as pixel values range from 0 to 255.
Data Preprocessing¶
Converting images to grayscale¶
# Function to convert RGB images to grayscale
def convert_to_grayscale(images):
    gray_images = []
    for img in images:
        # Convert RGB to grayscale using cv2
        gray_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        # Add channel dimension for CNN input (H, W, 1)
        gray_img = gray_img[..., np.newaxis]
        gray_images.append(gray_img)
    return np.array(gray_images)
# Convert images to grayscale
gray_images = convert_to_grayscale(images)
# Display sample images before and after conversion
plt.figure(figsize=(12, 6))
for i in range(3):
    # Original RGB image
    plt.subplot(2, 3, i + 1)
    plt.imshow(images[i])
    plt.title('Original RGB')
    plt.axis('off')
    # Grayscale image
    plt.subplot(2, 3, i + 4)
    plt.imshow(gray_images[i].squeeze(), cmap='gray')
    plt.title('Grayscale')
    plt.axis('off')
plt.tight_layout()
plt.show()
print("Original image shape:", images[0].shape)
print("Grayscale image shape:", gray_images[0].shape)
Original image shape: (200, 200, 3)
Grayscale image shape: (200, 200, 1)
# Apply Gaussian blur to the grayscale images
def apply_gaussian_blur(images, kernel_size=(5, 5)):
    blurred_images = []
    for img in images:
        # Apply Gaussian blur (note: cv2 drops the singleton channel dimension)
        blurred = cv2.GaussianBlur(img, kernel_size, 0)
        blurred_images.append(blurred)
    return np.array(blurred_images)
# Apply blur to grayscale images
blurred_images = apply_gaussian_blur(gray_images)
# Apply Laplacian edge detection to the grayscale images
def apply_laplacian(images, ksize=3):
    laplacian_images = []
    for img in images:
        # Apply Gaussian blur first to reduce noise
        blurred = cv2.GaussianBlur(img, (5, 5), 0)
        # Apply Laplacian
        laplacian = cv2.Laplacian(blurred, cv2.CV_64F, ksize=ksize)
        # Take the absolute response and cast back to uint8
        laplacian = np.uint8(np.absolute(laplacian))
        laplacian_images.append(laplacian)
    return np.array(laplacian_images)
# Apply Laplacian edge detection on grayscale images
laplacian_images = apply_laplacian(gray_images)
# Display sample images to compare all preprocessing steps
plt.figure(figsize=(15, 8))
for i in range(3):
    # Original RGB image
    plt.subplot(4, 3, i + 1)
    plt.imshow(images[i])
    plt.title('Original RGB')
    plt.axis('off')
    # Grayscale image
    plt.subplot(4, 3, i + 4)
    plt.imshow(gray_images[i].squeeze(), cmap='gray')
    plt.title('Grayscale')
    plt.axis('off')
    # Blurred image
    plt.subplot(4, 3, i + 7)
    plt.imshow(blurred_images[i].squeeze(), cmap='gray')
    plt.title('Gaussian Blur')
    plt.axis('off')
    # Laplacian image
    plt.subplot(4, 3, i + 10)
    plt.imshow(laplacian_images[i].squeeze(), cmap='gray')
    plt.title('Laplacian Edge Detection')
    plt.axis('off')
plt.tight_layout()
plt.show()
print("Original image shape:", images[0].shape)
print("Grayscale image shape:", gray_images[0].shape)
print("Blurred image shape:", blurred_images[0].shape)
print("Laplacian image shape:", laplacian_images[0].shape)
Original image shape: (200, 200, 3)
Grayscale image shape: (200, 200, 1)
Blurred image shape: (200, 200)
Laplacian image shape: (200, 200)
Preparing images¶
- Grayscale or blurred images can be used if we need to improve the performance of the model.
- In later stages of the project, we can decide to use these images if needed.
- As a general observation from the images above, the different filters highlight edges and curves in different ways.
- Note that cv2.GaussianBlur and cv2.Laplacian drop the singleton channel dimension (the printed shapes become (200, 200)), so it would need to be re-added before feeding these images to a CNN.
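For intuition about why the Laplacian filter brings out edges: at its core it is a small convolution kernel that responds only where intensity changes. A minimal numpy sketch using the classic 4-neighbour 3×3 kernel (illustrative only; `cv2.Laplacian` uses its own kernel for `ksize=3`):

```python
import numpy as np

# Classic 4-neighbour Laplacian kernel
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]])

def laplacian_response(patch):
    """Response of the kernel at the centre of a 3x3 patch:
    sum of the elementwise product of patch and kernel."""
    return int((patch * kernel).sum())

flat = np.full((3, 3), 10)        # uniform region -> no response
print(laplacian_response(flat))   # 0

edge = np.array([[10, 10, 10],
                 [10, 10, 10],
                 [50, 50, 50]])   # intensity jump below the centre
print(laplacian_response(edge))   # 40
```

Flat regions cancel out to zero, while intensity jumps leave a nonzero response, which is why helmet outlines stand out in the filtered images above.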
Splitting the dataset¶
# images, gray_images, blurred_images, laplacian_images
# Splitting the data into training, validation, and testing sets; since there are only 631 samples, we use 60% for training, 20% for validation, and 20% for testing
# splitting rgb images
x_train_rgb, x_temp_rgb, y_train_rgb, y_temp_rgb = train_test_split(images, labels['Label'], test_size=0.4, random_state=42, stratify=labels['Label'])
x_val_rgb, x_test_rgb, y_val_rgb, y_test_rgb = train_test_split(x_temp_rgb, y_temp_rgb, test_size=0.5, random_state=42, stratify=y_temp_rgb)
# splitting grayscale images
x_train_gray, x_temp_gray, y_train_gray, y_temp_gray = train_test_split(gray_images, labels['Label'], test_size=0.4, random_state=42, stratify=labels['Label'])
x_val_gray, x_test_gray, y_val_gray, y_test_gray = train_test_split(x_temp_gray, y_temp_gray, test_size=0.5, random_state=42, stratify=y_temp_gray)
# splitting blurred images
#x_train_blur, x_temp_blur, y_train_blur, y_temp_blur = train_test_split(blurred_images, labels['Label'], test_size=0.4, random_state=42, stratify=labels['Label'])
#x_val_blur, x_test_blur, y_val_blur, y_test_blur = train_test_split(x_temp_blur, y_temp_blur, test_size=0.5, random_state=42, stratify=y_temp_blur)
# splitting laplacian images
#x_train_lap, x_temp_lap, y_train_lap, y_temp_lap = train_test_split(laplacian_images, labels['Label'], test_size=0.4, random_state=42, stratify=labels['Label'])
#x_val_lap, x_test_lap, y_val_lap, y_test_lap = train_test_split(x_temp_lap, y_temp_lap, test_size=0.5, random_state=42, stratify=y_temp_lap)
- We could also experiment with an ANN using the different filters (commented out above, as these are out of scope for this assignment).
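As a sanity check on the 60/20/20 split: scikit-learn rounds a fractional `test_size` up, so the 631 samples land as 378 train / 126 validation / 127 test (matching the batch counts seen in the training output later):

```python
import math

total = 631
n_temp = math.ceil(total * 0.4)    # held out by the first split -> 253
n_train = total - n_temp           # -> 378
n_test = math.ceil(n_temp * 0.5)   # second split: half of the held-out set -> 127
n_val = n_temp - n_test            # -> 126
print(n_train, n_val, n_test)      # 378 126 127
```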
Data Normalization¶
# A label binarizer is not needed, as the labels are already in binary format (0 and 1)
# As observed earlier in data exploration, the pixel values are between 0 and 255, so let's normalize them
# normalizing rgb images
x_train_normalized_rgb = x_train_rgb.astype('float32') / 255.0
x_val_normalized_rgb = x_val_rgb.astype('float32') / 255.0
x_test_normalized_rgb = x_test_rgb.astype('float32') / 255.0
# normalizing grayscale images
x_train_normalized_gray = x_train_gray.astype('float32') / 255.0
x_val_normalized_gray = x_val_gray.astype('float32') / 255.0
x_test_normalized_gray = x_test_gray.astype('float32') / 255.0
# normalizing blurred images
# x_train_normalized_blur = x_train_blur.astype('float32') / 255.0
# x_val_normalized_blur = x_val_blur.astype('float32') / 255.0
# x_test_normalized_blur = x_test_blur.astype('float32') / 255.0
# normalizing laplacian images
# x_train_normalized_lap = x_train_lap.astype('float32') / 255.0
# x_val_normalized_lap = x_val_lap.astype('float32') / 255.0
# x_test_normalized_lap = x_test_lap.astype('float32') / 255.0
Normalized the datasets to train the models.
We could also try an ANN instead of a CNN using the converted images above (out of scope for this assignment).
We will now build all our models using the RGB images.
Model Building¶
Model Evaluation Criterion¶
The models are evaluated on accuracy, recall, precision, and F1 score, with confusion matrices plotted to inspect the error types.
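The utility below reports weighted averages of these metrics; for intuition, here are the plain binary forms, which reduce to simple counts from the confusion matrix. A toy pure-Python example (illustrative values, not results from this project):

```python
# Toy ground truth and predictions (1 = with helmet, 0 = without)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives  -> 3
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives  -> 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives -> 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives -> 1

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```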
Utility Functions¶
# Defining a function to compute different metrics to check the performance of a classification model
def model_performance_classification(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # Checking which predicted probabilities are greater than the 0.5 threshold
    pred = model.predict(predictors).reshape(-1) > 0.5
    target = target.to_numpy().reshape(-1)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred, average='weighted')  # to compute Recall
    precision = precision_score(target, pred, average='weighted')  # to compute Precision
    f1 = f1_score(target, pred, average='weighted')  # to compute F1-score
    # Creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1 Score": f1},
        index=[0],
    )
    return df_perf
def plot_confusion_matrix(model, predictors, target, ml=False):
    """
    Function to plot the confusion matrix
    model: classifier
    predictors: independent variables
    target: dependent variable
    ml: To specify if the model used is an sklearn ML model or not (True means ML model)
    """
    # Checking which predicted probabilities are greater than the 0.5 threshold
    pred = model.predict(predictors).reshape(-1) > 0.5
    target = target.to_numpy().reshape(-1)
    # Computing the confusion matrix with tf.math.confusion_matrix, predefined in the TensorFlow module
    confusion_matrix = tf.math.confusion_matrix(target, pred)
    f, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(
        confusion_matrix,
        annot=True,
        linewidths=.4,
        fmt="d",
        square=True,
        ax=ax
    )
    plt.show()
Model 1: Simple Convolutional Neural Network (CNN)¶
# Basic CNN model for helmet detection (using RGB images)
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=x_train_rgb.shape[1:]),  # (200, 200, 3)
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
cnn_model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
cnn_model.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_4 (Conv2D) │ (None, 198, 198, 32) │ 896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_4 (MaxPooling2D) │ (None, 99, 99, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_5 (Conv2D) │ (None, 97, 97, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_5 (MaxPooling2D) │ (None, 48, 48, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_2 (Flatten) │ (None, 147456) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_4 (Dense) │ (None, 64) │ 9,437,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_2 (Dropout) │ (None, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_5 (Dense) │ (None, 1) │ 65 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 9,456,705 (36.07 MB)
Trainable params: 9,456,705 (36.07 MB)
Non-trainable params: 0 (0.00 B)
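The summary above shows the first Dense layer dominating the parameter count; the figure follows directly from the size of the flattened feature map:

```python
# The last pooling output is (48, 48, 64); flattening it gives the Dense(64) input size
flat_units = 48 * 48 * 64            # 147,456 inputs
dense_params = flat_units * 64 + 64  # one weight per input per unit, plus 64 biases
print(dense_params)                  # 9437248
```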
# Fit the basic CNN model using normalized RGB images
history_basic_cnn = cnn_model.fit(
x_train_normalized_rgb, y_train_rgb,
epochs=20,
batch_size=32,
validation_data=(x_val_normalized_rgb, y_val_rgb),
verbose=2,
shuffle=True
)
Epoch 1/20 12/12 - 4s - 337ms/step - accuracy: 0.6878 - loss: 1.2283 - val_accuracy: 0.9762 - val_loss: 0.1515 Epoch 2/20 12/12 - 3s - 215ms/step - accuracy: 0.9683 - loss: 0.1333 - val_accuracy: 0.9762 - val_loss: 0.0673 Epoch 3/20 12/12 - 3s - 219ms/step - accuracy: 0.9735 - loss: 0.0921 - val_accuracy: 0.9762 - val_loss: 0.0603 Epoch 4/20 12/12 - 3s - 222ms/step - accuracy: 0.9947 - loss: 0.0324 - val_accuracy: 0.9762 - val_loss: 0.0703 Epoch 5/20 12/12 - 3s - 229ms/step - accuracy: 0.9894 - loss: 0.0352 - val_accuracy: 1.0000 - val_loss: 0.0074 Epoch 6/20 12/12 - 3s - 213ms/step - accuracy: 0.9815 - loss: 0.0404 - val_accuracy: 0.9841 - val_loss: 0.0336 Epoch 7/20 12/12 - 2s - 200ms/step - accuracy: 0.9921 - loss: 0.0204 - val_accuracy: 1.0000 - val_loss: 0.0109 Epoch 8/20 12/12 - 3s - 212ms/step - accuracy: 0.9947 - loss: 0.0162 - val_accuracy: 0.9762 - val_loss: 0.0961 Epoch 9/20 12/12 - 2s - 198ms/step - accuracy: 0.9947 - loss: 0.0195 - val_accuracy: 0.9683 - val_loss: 0.0947 Epoch 10/20 12/12 - 6s - 478ms/step - accuracy: 0.9974 - loss: 0.0102 - val_accuracy: 1.0000 - val_loss: 0.0090 Epoch 11/20 12/12 - 4s - 326ms/step - accuracy: 0.9735 - loss: 0.0728 - val_accuracy: 0.9841 - val_loss: 0.0430 Epoch 12/20 12/12 - 3s - 255ms/step - accuracy: 0.9921 - loss: 0.0311 - val_accuracy: 1.0000 - val_loss: 0.0100 Epoch 13/20 12/12 - 6s - 541ms/step - accuracy: 0.9974 - loss: 0.0224 - val_accuracy: 0.9762 - val_loss: 0.0851 Epoch 14/20 12/12 - 3s - 261ms/step - accuracy: 0.9974 - loss: 0.0153 - val_accuracy: 0.9762 - val_loss: 0.0732 Epoch 15/20 12/12 - 2s - 202ms/step - accuracy: 1.0000 - loss: 0.0067 - val_accuracy: 1.0000 - val_loss: 0.0042 Epoch 16/20 12/12 - 3s - 211ms/step - accuracy: 1.0000 - loss: 0.0063 - val_accuracy: 1.0000 - val_loss: 0.0027 Epoch 17/20 12/12 - 2s - 199ms/step - accuracy: 1.0000 - loss: 0.0014 - val_accuracy: 1.0000 - val_loss: 0.0029 Epoch 18/20 12/12 - 2s - 201ms/step - accuracy: 1.0000 - loss: 0.0028 - val_accuracy: 1.0000 - val_loss: 
0.0036 Epoch 19/20 12/12 - 2s - 202ms/step - accuracy: 1.0000 - loss: 0.0030 - val_accuracy: 1.0000 - val_loss: 0.0015 Epoch 20/20 12/12 - 2s - 202ms/step - accuracy: 1.0000 - loss: 0.0024 - val_accuracy: 0.9921 - val_loss: 0.0094
def plot_training_history(history):
    """
    Function to plot training and validation accuracy and loss
    history: History object returned by model.fit()
    """
    plt.figure(figsize=(14, 5))
    # Accuracy curves
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')
    plt.plot(history.history['val_accuracy'], label='Val Accuracy')
    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    # Loss curves
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train Loss')
    plt.plot(history.history['val_loss'], label='Val Loss')
    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.tight_layout()
    plt.show()
plot_training_history(history_basic_cnn)
performance_test_basic_cnn = model_performance_classification(cnn_model, x_test_normalized_rgb, y_test_rgb)
print("Performance of Basic CNN Model on Test set of RGB Images:")
print(performance_test_basic_cnn)
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step
Performance of Basic CNN Model on Test set of RGB Images:
   Accuracy    Recall  Precision  F1 Score
0  0.992126  0.992126   0.992249  0.992126
# Plot confusion matrix for test set predictions
plot_confusion_matrix(cnn_model, x_test_normalized_rgb, y_test_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step
performance_val_basic_cnn = model_performance_classification(cnn_model, x_val_normalized_rgb, y_val_rgb)
print("Performance of Basic CNN Model on val set of RGB Images:")
print(performance_val_basic_cnn)
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
Performance of Basic CNN Model on val set of RGB Images:
   Accuracy    Recall  Precision  F1 Score
0  0.992063  0.992063   0.992186  0.992062
# Plot confusion matrix for val set predictions
plot_confusion_matrix(cnn_model, x_val_normalized_rgb, y_val_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step
performance_train_basic_cnn = model_performance_classification(cnn_model, x_train_normalized_rgb, y_train_rgb)
print("Performance of Basic CNN Model on Training set of RGB Images:")
print(performance_train_basic_cnn)
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 43ms/step
Performance of Basic CNN Model on Training set of RGB Images:
   Accuracy  Recall  Precision  F1 Score
0       1.0     1.0        1.0       1.0
# Plot confusion matrix for training set predictions
plot_confusion_matrix(cnn_model, x_train_normalized_rgb, y_train_rgb, ml=True)
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 44ms/step
Visualizing the predictions¶
def plot_sample_predictions_on_val_set(model):
    # Visualize predictions on validation images
    num_samples = 8
    plt.figure(figsize=(16, 8))
    pred_probs = model.predict(x_val_normalized_rgb[:num_samples])
    pred_labels = (pred_probs > 0.5).astype(int).reshape(-1)
    for i in range(num_samples):
        plt.subplot(2, num_samples//2, i+1)
        plt.imshow(x_val_rgb[i])
        true_label = y_val_rgb.iloc[i] if hasattr(y_val_rgb, 'iloc') else y_val_rgb[i]
        plt.title(f"True: {'Helmet' if true_label==1 else 'No Helmet'}\nPred: {'Helmet' if pred_labels[i]==1 else 'No Helmet'}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
def plot_sample_predictions_on_test_set(model):
    # Visualize predictions on test images
    num_samples = 8
    plt.figure(figsize=(16, 8))
    pred_probs = model.predict(x_test_normalized_rgb[:num_samples])
    pred_labels = (pred_probs > 0.5).astype(int).reshape(-1)
    for i in range(num_samples):
        plt.subplot(2, num_samples//2, i+1)
        plt.imshow(x_test_rgb[i])
        true_label = y_test_rgb.iloc[i] if hasattr(y_test_rgb, 'iloc') else y_test_rgb[i]
        plt.title(f"True: {'Helmet' if true_label==1 else 'No Helmet'}\nPred: {'Helmet' if pred_labels[i]==1 else 'No Helmet'}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
plot_sample_predictions_on_val_set(cnn_model)
plot_sample_predictions_on_test_set(cnn_model)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step
- A simple CNN performed very well on the given data.
- Let's try a pretrained VGG16 model to see how it performs.
# Let's try the same model with grayscale images
# Basic CNN model for helmet detection (using grayscale images)
cnn_model_gray = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=x_train_gray.shape[1:]),  # (200, 200, 1)
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
cnn_model_gray.compile(optimizer=Adam(learning_rate=0.001),
                       loss='binary_crossentropy',
                       metrics=['accuracy'])
cnn_model_gray.summary()
Model: "sequential_3"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d_6 (Conv2D) │ (None, 198, 198, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_6 (MaxPooling2D) │ (None, 99, 99, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_7 (Conv2D) │ (None, 97, 97, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_7 (MaxPooling2D) │ (None, 48, 48, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_3 (Flatten) │ (None, 147456) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_6 (Dense) │ (None, 64) │ 9,437,248 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_3 (Dropout) │ (None, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_7 (Dense) │ (None, 1) │ 65 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 9,456,129 (36.07 MB)
Trainable params: 9,456,129 (36.07 MB)
Non-trainable params: 0 (0.00 B)
# Fit the basic CNN model using normalized grayscale images
history_basic_cnn_gray = cnn_model_gray.fit(
x_train_normalized_gray, y_train_gray,
epochs=20,
batch_size=32,
validation_data=(x_val_normalized_gray, y_val_gray),
verbose=2,
shuffle=True
)
Epoch 1/20 12/12 - 3s - 272ms/step - accuracy: 0.6402 - loss: 0.7202 - val_accuracy: 0.9444 - val_loss: 0.3066 Epoch 2/20 12/12 - 2s - 187ms/step - accuracy: 0.9339 - loss: 0.1733 - val_accuracy: 0.9762 - val_loss: 0.0882 Epoch 3/20 12/12 - 2s - 186ms/step - accuracy: 0.9868 - loss: 0.0583 - val_accuracy: 0.9921 - val_loss: 0.0166 Epoch 4/20 12/12 - 2s - 187ms/step - accuracy: 0.9894 - loss: 0.0338 - val_accuracy: 0.9841 - val_loss: 0.0312 Epoch 5/20 12/12 - 2s - 189ms/step - accuracy: 1.0000 - loss: 0.0158 - val_accuracy: 0.9841 - val_loss: 0.0533 Epoch 6/20 12/12 - 2s - 187ms/step - accuracy: 0.9974 - loss: 0.0136 - val_accuracy: 0.9921 - val_loss: 0.0075 Epoch 7/20 12/12 - 2s - 185ms/step - accuracy: 0.9974 - loss: 0.0116 - val_accuracy: 0.9921 - val_loss: 0.0193 Epoch 8/20 12/12 - 2s - 191ms/step - accuracy: 1.0000 - loss: 0.0067 - val_accuracy: 0.9841 - val_loss: 0.0533 Epoch 9/20 12/12 - 2s - 187ms/step - accuracy: 0.9974 - loss: 0.0100 - val_accuracy: 1.0000 - val_loss: 0.0022 Epoch 10/20 12/12 - 2s - 195ms/step - accuracy: 1.0000 - loss: 0.0035 - val_accuracy: 0.9921 - val_loss: 0.0294 Epoch 11/20 12/12 - 2s - 184ms/step - accuracy: 0.9974 - loss: 0.0090 - val_accuracy: 1.0000 - val_loss: 0.0021 Epoch 12/20 12/12 - 2s - 187ms/step - accuracy: 1.0000 - loss: 0.0033 - val_accuracy: 1.0000 - val_loss: 0.0038 Epoch 13/20 12/12 - 2s - 184ms/step - accuracy: 1.0000 - loss: 0.0018 - val_accuracy: 0.9921 - val_loss: 0.0174 Epoch 14/20 12/12 - 2s - 183ms/step - accuracy: 0.9974 - loss: 0.0030 - val_accuracy: 1.0000 - val_loss: 0.0011 Epoch 15/20 12/12 - 2s - 182ms/step - accuracy: 1.0000 - loss: 5.5501e-04 - val_accuracy: 0.9921 - val_loss: 0.0088 Epoch 16/20 12/12 - 2s - 187ms/step - accuracy: 1.0000 - loss: 4.6206e-04 - val_accuracy: 0.9921 - val_loss: 0.0160 Epoch 17/20 12/12 - 2s - 184ms/step - accuracy: 1.0000 - loss: 2.8315e-04 - val_accuracy: 0.9921 - val_loss: 0.0183 Epoch 18/20 12/12 - 2s - 194ms/step - accuracy: 1.0000 - loss: 7.0238e-04 - val_accuracy: 
0.9921 - val_loss: 0.0068 Epoch 19/20 12/12 - 2s - 184ms/step - accuracy: 1.0000 - loss: 3.7387e-04 - val_accuracy: 0.9921 - val_loss: 0.0083 Epoch 20/20 12/12 - 2s - 196ms/step - accuracy: 0.9974 - loss: 0.0053 - val_accuracy: 0.9921 - val_loss: 0.0375
plot_training_history(history_basic_cnn_gray)
performance_test_basic_cnn_gray = model_performance_classification(cnn_model_gray, x_test_normalized_gray, y_test_gray)
print("Performance of Basic CNN Model on Test set of Grayscale Images:")
print(performance_test_basic_cnn_gray)
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step
Performance of Basic CNN Model on Test set of Grayscale Images:
   Accuracy    Recall  Precision  F1 Score
0  0.968504  0.968504   0.968962  0.968492
# Plot confusion matrix for test set predictions
plot_confusion_matrix(cnn_model_gray, x_test_normalized_gray, y_test_gray, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step
- The grayscale images also performed very well with the basic CNN model.
- We could continue testing the basic CNN, or build an ANN on the grayscale, blurred, or Laplacian-filtered images.
- For this exercise, let's continue with the VGG-16 models and see how the performance compares.
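One practical refinement suggested by the loss curves above (validation loss stops improving well before epoch 20) is early stopping. The core patience logic, sketched in plain Python (the helper `early_stop_epoch` is hypothetical; in Keras the equivalent is the `EarlyStopping` callback with `monitor='val_loss'` and a `patience` argument):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch training would stop at once the validation
    loss has failed to improve for `patience` consecutive epochs, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Toy trace: improvement stalls after the third value
print(early_stop_epoch([0.15, 0.07, 0.06, 0.07, 0.09, 0.08]))  # 6
```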
Model 2: VGG-16 (Base)¶
# VGG16-based model for helmet detection (using RGB images)
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model
# Load VGG16 base (without top, with imagenet weights)
vgg_base = VGG16(weights='imagenet', include_top=False, input_shape=x_train_rgb.shape[1:])
# Making all the layers of the VGG16 base non-trainable, i.e. freezing them
for layer in vgg_base.layers:
    layer.trainable = False
vgg_base.summary()
vgg16_model = Sequential()  # Initializing the Sequential model
vgg16_model.add(vgg_base)  # Adding the VGG16 base model
vgg16_model.add(Flatten())  # Flattening the output of the VGG16 base
vgg16_model.add(Dense(1, activation='sigmoid'))  # Sigmoid output for binary classification
vgg16_model.compile(optimizer=Adam(learning_rate=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
vgg16_model.summary()
Model: "vgg16"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ input_layer_4 (InputLayer) │ (None, 200, 200, 3) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block1_conv1 (Conv2D) │ (None, 200, 200, 64) │ 1,792 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block1_conv2 (Conv2D) │ (None, 200, 200, 64) │ 36,928 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block1_pool (MaxPooling2D) │ (None, 100, 100, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block2_conv1 (Conv2D) │ (None, 100, 100, 128) │ 73,856 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block2_conv2 (Conv2D) │ (None, 100, 100, 128) │ 147,584 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block2_pool (MaxPooling2D) │ (None, 50, 50, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_conv1 (Conv2D) │ (None, 50, 50, 256) │ 295,168 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_conv2 (Conv2D) │ (None, 50, 50, 256) │ 590,080 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_conv3 (Conv2D) │ (None, 50, 50, 256) │ 590,080 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block3_pool (MaxPooling2D) │ (None, 25, 25, 256) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_conv1 (Conv2D) │ (None, 25, 25, 512) │ 1,180,160 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_conv2 (Conv2D) │ (None, 25, 25, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_conv3 (Conv2D) │ (None, 25, 25, 512) 
│ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block4_pool (MaxPooling2D) │ (None, 12, 12, 512) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_conv1 (Conv2D) │ (None, 12, 12, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_conv2 (Conv2D) │ (None, 12, 12, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_conv3 (Conv2D) │ (None, 12, 12, 512) │ 2,359,808 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ block5_pool (MaxPooling2D) │ (None, 6, 6, 512) │ 0 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 14,714,688 (56.13 MB)
Trainable params: 0 (0.00 B)
Non-trainable params: 14,714,688 (56.13 MB)
Model: "sequential_4"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ vgg16 (Functional) │ (None, 6, 6, 512) │ 14,714,688 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_4 (Flatten) │ (None, 18432) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_8 (Dense) │ (None, 1) │ 18,433 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 14,733,121 (56.20 MB)
Trainable params: 18,433 (72.00 KB)
Non-trainable params: 14,714,688 (56.13 MB)
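The frozen `vgg_base` used as the backbone in the Sequential models below is not defined in this section; it was presumably created along these lines (a sketch, assuming ImageNet weights and the (200, 200, 3) input shown in the summary above):

```python
from tensorflow.keras.applications import VGG16

# Hypothetical reconstruction of the frozen backbone: VGG16 without its
# classifier head, matching the (200, 200, 3) input in the summary above.
vgg_base = VGG16(weights='imagenet', include_top=False, input_shape=(200, 200, 3))
vgg_base.trainable = False  # freeze all 14,714,688 parameters

# Five 2x2 max-pool layers halve the 200x200 input five times:
# 200 -> 100 -> 50 -> 25 -> 12 -> 6, giving the (None, 6, 6, 512) output.
print(vgg_base.output_shape)
```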
trainDataGen = ImageDataGenerator()  # no augmentation; simply batches the already-normalized images
# Fit the VGG16 model using normalized RGB images
history_vgg16 = vgg16_model.fit(trainDataGen.flow(x_train_normalized_rgb, y_train_rgb, batch_size=32,shuffle=False),
epochs=20,
validation_data=(x_val_normalized_rgb, y_val_rgb),
verbose=2
)
Epoch 1/20 12/12 - 27s - 2s/step - accuracy: 0.8995 - loss: 0.1998 - val_accuracy: 1.0000 - val_loss: 0.0213 Epoch 2/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0081 - val_accuracy: 1.0000 - val_loss: 0.0064 Epoch 3/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0029 - val_accuracy: 1.0000 - val_loss: 0.0046 Epoch 4/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 0.0020 - val_accuracy: 1.0000 - val_loss: 0.0042 Epoch 5/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0016 - val_accuracy: 1.0000 - val_loss: 0.0039 Epoch 6/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0014 - val_accuracy: 1.0000 - val_loss: 0.0038 Epoch 7/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0013 - val_accuracy: 1.0000 - val_loss: 0.0035 Epoch 8/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0012 - val_accuracy: 1.0000 - val_loss: 0.0034 Epoch 9/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 0.0011 - val_accuracy: 1.0000 - val_loss: 0.0033 Epoch 10/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 0.0010 - val_accuracy: 1.0000 - val_loss: 0.0032 Epoch 11/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 9.3872e-04 - val_accuracy: 1.0000 - val_loss: 0.0031 Epoch 12/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 8.6986e-04 - val_accuracy: 1.0000 - val_loss: 0.0030 Epoch 13/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 8.1396e-04 - val_accuracy: 1.0000 - val_loss: 0.0029 Epoch 14/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 7.6172e-04 - val_accuracy: 1.0000 - val_loss: 0.0028 Epoch 15/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 7.1410e-04 - val_accuracy: 1.0000 - val_loss: 0.0027 Epoch 16/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 6.7181e-04 - val_accuracy: 1.0000 - val_loss: 0.0026 Epoch 17/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 6.3435e-04 - val_accuracy: 1.0000 - val_loss: 0.0026 Epoch 18/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 5.9627e-04 - val_accuracy: 1.0000 - val_loss: 
0.0025 Epoch 19/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 5.6484e-04 - val_accuracy: 1.0000 - val_loss: 0.0024 Epoch 20/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 5.3532e-04 - val_accuracy: 1.0000 - val_loss: 0.0023
plot_training_history(history_vgg16)
performance_train_basic_vgg = model_performance_classification(vgg16_model, x_train_normalized_rgb, y_train_rgb)
print("Performance of Basic VGG Model on Training set of RGB Images:")
print(performance_train_basic_vgg)
12/12 ━━━━━━━━━━━━━━━━━━━━ 20s 2s/step Performance of Basic VGG Model on Training set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(vgg16_model, x_train_normalized_rgb, y_train_rgb, ml=True)
12/12 ━━━━━━━━━━━━━━━━━━━━ 19s 2s/step
performance_val_basic_vgg = model_performance_classification(vgg16_model, x_val_normalized_rgb, y_val_rgb)
print("Performance of Basic VGG16 Model on Val set of RGB Images:")
print(performance_val_basic_vgg)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step Performance of Basic VGG16 Model on Val set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(vgg16_model, x_val_normalized_rgb, y_val_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 7s 2s/step
# performance classification on test set
performance_test_basic_vgg = model_performance_classification(vgg16_model, x_test_normalized_rgb, y_test_rgb)
print("Performance of Basic VGG16 Model on Test set of RGB Images:")
print(performance_test_basic_vgg)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step Performance of Basic VGG16 Model on Test set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
# Plot confusion matrix for test set predictions
plot_confusion_matrix(vgg16_model, x_test_normalized_rgb, y_test_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step
Visualizing the predictions¶
print("Sample visualization of predictions on the validation set")
plot_sample_predictions_on_val_set(vgg16_model)
print("Sample visualization of predictions on the test set")
plot_sample_predictions_on_test_set(vgg16_model)
Sample visualization of predictions on the validation set 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 413ms/step
Sample visualization of predictions on the test set 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 411ms/step
Observations from VGG16 Model Results¶
- The VGG16 transfer learning model achieved perfect accuracy on the validation and test sets, indicating strong generalization on this dataset.
- The training and validation curves show minimal overfitting, likely because the convolutional base was kept frozen and only the final dense layer (18,433 parameters) was trained.
- The confusion matrices show that the model distinguishes cleanly between the 'With Helmet' and 'Without Helmet' classes.
- VGG16 and the basic CNN performed almost the same on this use case; still, being pretrained, VGG16 may be the better choice for handling more varied images in the future.
- Further improvements could come from adding fully connected layers, data augmentation, fine-tuning, or experimenting with other architectures.
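One of the improvements mentioned above, fine-tuning, could be sketched as follows: unfreeze only the last convolutional block of the base and recompile with a much smaller learning rate. This is a sketch with stand-in objects, not code that was run in this notebook (`weights=None` here only keeps the sketch light; the notebook's base uses pretrained ImageNet weights):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Stand-ins for the notebook's vgg_base / vgg16_model objects.
vgg_base = VGG16(weights=None, include_top=False, input_shape=(200, 200, 3))
model = Sequential([vgg_base, Flatten(), Dense(1, activation='sigmoid')])

# Fine-tuning sketch: unfreeze only block5; the 'block5_...' layer names
# match the VGG16 summary printed earlier.
vgg_base.trainable = True
for layer in vgg_base.layers:
    layer.trainable = layer.name.startswith('block5')

# Recompile with a much smaller learning rate so the pretrained features
# are adjusted gently rather than overwritten.
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```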
Model 3: VGG-16 (Base + FFNN)¶
# Let's create a feed-forward neural network on top of the features extracted by the VGG16 base
vgg16_ffnn_model = Sequential()
vgg16_ffnn_model.add(vgg_base) # Adding the VGG16 base model
vgg16_ffnn_model.add(Flatten())# Flattening the output of the VGG
# Adding fully connected layers
vgg16_ffnn_model.add(Dense(128, activation='relu'))
vgg16_ffnn_model.add(Dropout(0.5))
vgg16_ffnn_model.add(Dense(64, activation='relu'))
vgg16_ffnn_model.add(Dense(1, activation='sigmoid'))
vgg16_ffnn_model.compile(optimizer=Adam(learning_rate=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
vgg16_ffnn_model.summary()
Model: "sequential_5"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ vgg16 (Functional) │ (None, 6, 6, 512) │ 14,714,688 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_5 (Flatten) │ (None, 18432) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_9 (Dense) │ (None, 128) │ 2,359,424 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_4 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_10 (Dense) │ (None, 64) │ 8,256 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_11 (Dense) │ (None, 1) │ 65 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 17,082,433 (65.16 MB)
Trainable params: 2,367,745 (9.03 MB)
Non-trainable params: 14,714,688 (56.13 MB)
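The trainable parameter counts in the summary above can be verified by hand:

```python
# Verify the trainable parameter counts reported in the summary above.
flat = 6 * 6 * 512                 # Flatten: (6, 6, 512) -> 18432
dense_128 = flat * 128 + 128       # dense_9: weights + biases
dense_64 = 128 * 64 + 64           # dense_10
dense_1 = 64 * 1 + 1               # dense_11 (sigmoid output)

assert flat == 18432
assert dense_128 == 2_359_424
assert dense_64 == 8_256
assert dense_1 == 65
print(dense_128 + dense_64 + dense_1)  # 2367745, the trainable total
```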
# Fit the vgg16_ffnn_model using normalized RGB images
history_vgg16_ffnn = vgg16_ffnn_model.fit(trainDataGen.flow(x_train_normalized_rgb, y_train_rgb, batch_size=32,shuffle=False),
epochs=20,
validation_data=(x_val_normalized_rgb, y_val_rgb),
verbose=2
)
# Note: the trainDataGen reused here applies no augmentation; ImageDataGenerator() with default arguments only batches the data
Epoch 1/20 12/12 - 26s - 2s/step - accuracy: 0.8095 - loss: 0.4135 - val_accuracy: 0.9921 - val_loss: 0.0140 Epoch 2/20 12/12 - 25s - 2s/step - accuracy: 0.9974 - loss: 0.0123 - val_accuracy: 1.0000 - val_loss: 6.4031e-04 Epoch 3/20 12/12 - 25s - 2s/step - accuracy: 0.9974 - loss: 0.0073 - val_accuracy: 1.0000 - val_loss: 7.5962e-04 Epoch 4/20 12/12 - 25s - 2s/step - accuracy: 0.9974 - loss: 0.0046 - val_accuracy: 1.0000 - val_loss: 2.8602e-04 Epoch 5/20 12/12 - 25s - 2s/step - accuracy: 0.9974 - loss: 0.0052 - val_accuracy: 1.0000 - val_loss: 2.2116e-04 Epoch 6/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 0.0011 - val_accuracy: 1.0000 - val_loss: 6.7130e-04 Epoch 7/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 0.0012 - val_accuracy: 1.0000 - val_loss: 3.5986e-04 Epoch 8/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 8.2859e-04 - val_accuracy: 1.0000 - val_loss: 6.7191e-04 Epoch 9/20 12/12 - 25s - 2s/step - accuracy: 0.9974 - loss: 0.0030 - val_accuracy: 1.0000 - val_loss: 5.1351e-04 Epoch 10/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 8.2787e-04 - val_accuracy: 1.0000 - val_loss: 1.7627e-04 Epoch 11/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 2.4610e-04 - val_accuracy: 1.0000 - val_loss: 3.7951e-04 Epoch 12/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 2.8586e-04 - val_accuracy: 1.0000 - val_loss: 3.0774e-04 Epoch 13/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 1.6748e-04 - val_accuracy: 1.0000 - val_loss: 2.2281e-04 Epoch 14/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 3.2890e-04 - val_accuracy: 1.0000 - val_loss: 2.7520e-04 Epoch 15/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 1.3805e-04 - val_accuracy: 1.0000 - val_loss: 3.6553e-04 Epoch 16/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 4.5668e-04 - val_accuracy: 1.0000 - val_loss: 2.8049e-04 Epoch 17/20 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 5.4425e-04 - val_accuracy: 1.0000 - val_loss: 1.5664e-04 Epoch 18/20 12/12 - 26s - 2s/step - 
accuracy: 1.0000 - loss: 0.0017 - val_accuracy: 1.0000 - val_loss: 0.0019 Epoch 19/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0011 - val_accuracy: 1.0000 - val_loss: 0.0034 Epoch 20/20 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 2.2428e-04 - val_accuracy: 1.0000 - val_loss: 8.8601e-04
plot_training_history(history_vgg16_ffnn)
# performance classification on validation set
performance_val_vgg16_ffnn = model_performance_classification(vgg16_ffnn_model, x_val_normalized_rgb, y_val_rgb)
print("Performance of VGG16 FFNN Model on Val set of RGB Images:")
print(performance_val_vgg16_ffnn)
4/4 ━━━━━━━━━━━━━━━━━━━━ 7s 2s/step Performance of VGG16 FFNN Model on Val set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
# confusion matrix for validation set
plot_confusion_matrix(vgg16_ffnn_model, x_val_normalized_rgb, y_val_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 7s 2s/step
# performance classification on test set
performance_test_vgg16_ffnn = model_performance_classification(vgg16_ffnn_model, x_test_normalized_rgb, y_test_rgb)
print("Performance of VGG16 FFNN Model on Test set of RGB Images:")
print(performance_test_vgg16_ffnn)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step Performance of VGG16 FFNN Model on Test set of RGB Images: Accuracy Recall Precision F1 Score 0 0.992126 0.992126 0.992249 0.992126
# confusion matrix for test set
plot_confusion_matrix(vgg16_ffnn_model, x_test_normalized_rgb, y_test_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step
performance_train_vgg16_ffnn = model_performance_classification(vgg16_ffnn_model, x_train_normalized_rgb, y_train_rgb)
print("Performance of Basic VGG Model with FFNN on Training set of RGB Images:")
print(performance_train_vgg16_ffnn)
12/12 ━━━━━━━━━━━━━━━━━━━━ 19s 2s/step Performance of Basic VGG Model with FFNN on Training set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
# confusion matrix for train set
plot_confusion_matrix(vgg16_ffnn_model, x_train_normalized_rgb, y_train_rgb, ml=True)
12/12 ━━━━━━━━━━━━━━━━━━━━ 19s 2s/step
Visualizing the predictions¶
print("Sample visualization of predictions on the validation set")
plot_sample_predictions_on_val_set(vgg16_ffnn_model)
print("Sample visualization of predictions on the test set")
plot_sample_predictions_on_test_set(vgg16_ffnn_model)
Sample visualization of predictions on the validation set 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 422ms/step
Sample visualization of predictions on the test set 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 411ms/step
Observations¶
- The VGG16 base + FFNN model also performed well.
- Its performance matches the base model's, perhaps because of the small dataset size.
- The pretrained models learn quickly here because the two classes are clearly distinguishable.
Model 4: VGG-16 (Base + FFNN + Data Augmentation)¶
In most of the real-world case studies, it is challenging to acquire a large number of images and then train CNNs.
To overcome this problem, one approach we might consider is Data Augmentation.
CNNs have the property of translational invariance, which means they can recognise an object even if its appearance shifts translationally. Taking this attribute into account, we can augment the images using the techniques listed below:
- Horizontal Flip (should be set to True/False)
- Vertical Flip (should be set to True/False)
- Height Shift (should be between 0 and 1)
- Width Shift (should be between 0 and 1)
- Rotation (should be between 0 and 180)
- Shear (should be between 0 and 1)
- Zoom (should be between 0 and 1) etc.
Remember, data augmentation should not be applied to the validation or test sets.
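Before training on augmented data, it can help to eyeball what the transformations produce. A minimal sketch: a random image is used here so it is self-contained, whereas in the notebook a slice of x_train_normalized_rgb (e.g. `x_train_normalized_rgb[:1]`) would be passed instead:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One synthetic (200, 200, 3) image standing in for a training sample.
sample = np.random.rand(1, 200, 200, 3)

previewGen = ImageDataGenerator(rotation_range=20,
                                width_shift_range=0.2,
                                height_shift_range=0.2,
                                horizontal_flip=True,
                                fill_mode='nearest')

fig, axes = plt.subplots(1, 5, figsize=(15, 3))
# flow() yields batches indefinitely; zip stops after the five axes
for ax, batch in zip(axes, previewGen.flow(sample, batch_size=1)):
    ax.imshow(batch[0])
    ax.axis('off')
plt.show()
```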
# Let's create a feed-forward neural network using the features extracted by the VGG16 base (same as Model 3, but redefined here for clarity)
vgg16_ffnn_da_model = Sequential()
vgg16_ffnn_da_model.add(vgg_base) # Adding the VGG16 base model
vgg16_ffnn_da_model.add(Flatten())# Flattening the output of the VGG
# Adding fully connected layers
vgg16_ffnn_da_model.add(Dense(128, activation='relu'))
vgg16_ffnn_da_model.add(Dropout(0.5))
vgg16_ffnn_da_model.add(Dense(64, activation='relu'))
vgg16_ffnn_da_model.add(Dense(1, activation='sigmoid'))
# The idea here is to train this model on augmented data
vgg16_ffnn_da_model.compile(optimizer=Adam(learning_rate=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
vgg16_ffnn_da_model.summary()
Model: "sequential_6"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ vgg16 (Functional) │ (None, 6, 6, 512) │ 14,714,688 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten_6 (Flatten) │ (None, 18432) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_12 (Dense) │ (None, 128) │ 2,359,424 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout_5 (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_13 (Dense) │ (None, 64) │ 8,256 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_14 (Dense) │ (None, 1) │ 65 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 17,082,433 (65.16 MB)
Trainable params: 2,367,745 (9.03 MB)
Non-trainable params: 14,714,688 (56.13 MB)
from tensorflow.keras.callbacks import EarlyStopping
# Since we are going to use data augmentation, we define a new ImageDataGenerator with several augmentation techniques
dataGenAugmented = ImageDataGenerator(
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.15,
zoom_range=0.15,
horizontal_flip=True,
fill_mode='nearest'
)
# Also define an early stopping callback to prevent overfitting
early_stopping = EarlyStopping(
monitor='val_loss', # Monitor validation loss
patience=5, # Stop if no improvement for 5 epochs
mode='min', # Minimize the validation loss
verbose=1,
restore_best_weights=True # Restore best weights found during training
)
# Fit the vgg16_ffnn_da_model on augmented, normalized RGB images
# (using the augmented generator defined above)
history_vgg16_ffnn_da_model = vgg16_ffnn_da_model.fit(dataGenAugmented.flow(x_train_normalized_rgb, y_train_rgb, batch_size=32, shuffle=False),
epochs=200, # increased epochs since we have early stopping
callbacks=[early_stopping],
validation_data=(x_val_normalized_rgb, y_val_rgb),
verbose=2
)
Epoch 1/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 1.1077e-04 - val_accuracy: 1.0000 - val_loss: 1.4701e-06 Epoch 2/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 2.4386e-05 - val_accuracy: 1.0000 - val_loss: 1.3341e-06 Epoch 3/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 1.7757e-05 - val_accuracy: 1.0000 - val_loss: 1.2750e-06 Epoch 4/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 3.4366e-05 - val_accuracy: 1.0000 - val_loss: 1.2407e-06 Epoch 5/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 1.1623e-05 - val_accuracy: 1.0000 - val_loss: 1.2103e-06 Epoch 6/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 3.0353e-05 - val_accuracy: 1.0000 - val_loss: 1.1957e-06 Epoch 7/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 6.9421e-05 - val_accuracy: 1.0000 - val_loss: 1.2020e-06 Epoch 8/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 6.2499e-05 - val_accuracy: 1.0000 - val_loss: 1.2558e-06 Epoch 9/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 8.4202e-05 - val_accuracy: 1.0000 - val_loss: 1.1266e-06 Epoch 10/200 12/12 - 26s - 2s/step - accuracy: 1.0000 - loss: 9.4916e-05 - val_accuracy: 1.0000 - val_loss: 2.6783e-06 Epoch 11/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 8.8229e-06 - val_accuracy: 1.0000 - val_loss: 5.2944e-06 Epoch 12/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 9.4572e-06 - val_accuracy: 1.0000 - val_loss: 5.8613e-06 Epoch 13/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 0.0011 - val_accuracy: 1.0000 - val_loss: 2.2558e-06 Epoch 14/200 12/12 - 25s - 2s/step - accuracy: 1.0000 - loss: 3.2293e-06 - val_accuracy: 1.0000 - val_loss: 1.5033e-06 Epoch 14: early stopping Restoring model weights from the end of the best epoch: 9.
plot_training_history(history_vgg16_ffnn_da_model)
Observations¶
- Early stopping halted training at epoch 14, even though up to 200 epochs were allowed.
- The best weights, found at epoch 9, were restored.
- The loss curves clearly show this behaviour.
# performance classification on validation set
performance_val_vgg16_ffnn_da = model_performance_classification(vgg16_ffnn_da_model, x_val_normalized_rgb, y_val_rgb)
print("Performance of VGG16 FFNN Model with data augmentation on Val set of RGB Images:")
print(performance_val_vgg16_ffnn_da)
4/4 ━━━━━━━━━━━━━━━━━━━━ 7s 2s/step Performance of VGG16 FFNN Model with data augmentation on Val set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
# confusion matrix for validation set
plot_confusion_matrix(vgg16_ffnn_da_model, x_val_normalized_rgb, y_val_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step
# performance classification on Test set
performance_test_vgg16_ffnn_da = model_performance_classification(vgg16_ffnn_da_model, x_test_normalized_rgb, y_test_rgb)
print("Performance of VGG16 FFNN Model with data augmentation on Test set of RGB Images:")
print(performance_test_vgg16_ffnn_da)
4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step Performance of VGG16 FFNN Model with data augmentation on Test set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
#confusion matrix for test set
plot_confusion_matrix(vgg16_ffnn_da_model, x_test_normalized_rgb, y_test_rgb, ml=True)
4/4 ━━━━━━━━━━━━━━━━━━━━ 11s 3s/step
performance_train_vgg16_ffnn_da = model_performance_classification(vgg16_ffnn_da_model, x_train_normalized_rgb, y_train_rgb)
print("Performance of Basic VGG Model with FFNN and Data Augmentation on Training set of RGB Images:")
print(performance_train_vgg16_ffnn_da)
12/12 ━━━━━━━━━━━━━━━━━━━━ 30s 2s/step Performance of Basic VGG Model with FFNN and Data Augmentation on Training set of RGB Images: Accuracy Recall Precision F1 Score 0 1.0 1.0 1.0 1.0
plot_confusion_matrix(vgg16_ffnn_da_model, x_train_normalized_rgb, y_train_rgb, ml=True)
12/12 ━━━━━━━━━━━━━━━━━━━━ 24s 2s/step
Visualizing the predictions¶
print("Sample visualization of predictions on the validation set")
plot_sample_predictions_on_val_set(vgg16_ffnn_da_model)
print("Sample visualization of predictions on the test set")
plot_sample_predictions_on_test_set(vgg16_ffnn_da_model)
Sample visualization of predictions on the validation set 1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 820ms/step
Sample visualization of predictions on the test set 1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 538ms/step
Observations¶
- This model also performed well and is on par with the previous models.
- Training finished quickly thanks to early stopping, which halted at epoch 14 even though up to 200 epochs were allowed.
Model Performance Comparison and Final Model Selection¶
# load all performance results into a dataframe for comparison
performance_comparison = pd.DataFrame({
'Model': ['Basic CNN RGB', 'VGG16 RGB', 'VGG16 FFNN RGB', 'VGG16 FFNN DA RGB'],
'Train Accuracy': [
performance_train_basic_cnn['Accuracy'].values[0],
performance_train_basic_vgg['Accuracy'].values[0],
performance_train_vgg16_ffnn['Accuracy'].values[0],
performance_train_vgg16_ffnn_da['Accuracy'].values[0]
],
'Val Accuracy': [
performance_val_basic_cnn['Accuracy'].values[0],
performance_val_basic_vgg['Accuracy'].values[0],
performance_val_vgg16_ffnn['Accuracy'].values[0],
performance_val_vgg16_ffnn_da['Accuracy'].values[0]
],
'Test Accuracy': [
performance_test_basic_cnn['Accuracy'].values[0],
performance_test_basic_vgg['Accuracy'].values[0],
performance_test_vgg16_ffnn['Accuracy'].values[0],
performance_test_vgg16_ffnn_da['Accuracy'].values[0]
]
})
# display the performance comparison
print("Performance Comparison of Different Models:")
print(performance_comparison)
Performance Comparison of Different Models:
Model Train Accuracy Val Accuracy Test Accuracy
0 Basic CNN RGB 1.0 0.992063 0.992126
1 VGG16 RGB 1.0 1.000000 1.000000
2 VGG16 FFNN RGB 1.0 1.000000 0.992126
3 VGG16 FFNN DA RGB 1.0 1.000000 1.000000
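The comparison could also be visualized as a grouped bar chart; a sketch that rebuilds a minimal frame (`comparison`, a hypothetical name) from the printed values above:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Minimal frame mirroring the printed comparison table above.
comparison = pd.DataFrame({
    'Model': ['Basic CNN RGB', 'VGG16 RGB', 'VGG16 FFNN RGB', 'VGG16 FFNN DA RGB'],
    'Val Accuracy': [0.992063, 1.0, 1.0, 1.0],
    'Test Accuracy': [0.992126, 1.0, 0.992126, 1.0],
})

ax = comparison.set_index('Model').plot.bar(rot=30, figsize=(8, 4))
ax.set_ylim(0.95, 1.005)  # zoom in, since all scores are near 1.0
ax.set_ylabel('Accuracy')
plt.tight_layout()
plt.show()
```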
Test Performance¶
testPerformances = pd.concat([
performance_test_basic_cnn.T,
performance_test_basic_cnn_gray.T,
performance_test_basic_vgg.T,
performance_test_vgg16_ffnn.T,
performance_test_vgg16_ffnn_da.T
], axis=1)
testPerformances.columns = ['Basic CNN RGB', 'Basic CNN Gray','VGG16 RGB', 'VGG16 FFNN RGB', 'VGG16 FFNN DA RGB']
print(testPerformances)
                Basic CNN RGB  Basic CNN Gray  VGG16 RGB  VGG16 FFNN RGB  VGG16 FFNN DA RGB
Accuracy             0.992126        0.968504        1.0        0.992126                1.0
Recall               0.992126        0.968504        1.0        0.992126                1.0
Precision            0.992249        0.968962        1.0        0.992249                1.0
F1 Score             0.992126        0.968492        1.0        0.992126                1.0
Actionable Insights & Recommendations¶
Recommendations¶
Deployment: The VGG16 model with a feed-forward classifier head and data augmentation (VGG16 FFNN DA RGB) is suggested for deployment, since it combines transfer learning on a pretrained model with data augmentation.
Option 2: The basic CNN model also performed well on this use case; where computing resources are limited, it is a viable alternative, being small enough to run on a CPU.
A pretrained model is preferred because, if the real-world images change in the future, the learned features can at least adapt to the new situations.
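For deployment, the selected model could be saved once and wrapped in a small inference helper. A sketch, assuming preprocessing matches training (resize to 200x200, scale to [0, 1]); the file name and the class-to-label mapping are assumptions and should be checked against the label encoding used when building y_train:

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Save once after training (hypothetical file name):
# vgg16_ffnn_da_model.save('helmet_classifier.keras')

def predict_helmet(image_path, model, threshold=0.5):
    """Classify a single image as 'With Helmet' or 'Without Helmet'."""
    img = load_img(image_path, target_size=(200, 200))
    x = img_to_array(img) / 255.0                 # same scaling as training
    prob = float(model.predict(x[np.newaxis], verbose=0)[0][0])
    # Sigmoid output; the class-1 meaning is assumed here -- verify it
    # against the label encoding used for y_train before deploying.
    label = 'Without Helmet' if prob > threshold else 'With Helmet'
    return label, prob

# Usage (hypothetical paths):
# model = load_model('helmet_classifier.keras')
# print(predict_helmet('site_photo.jpg', model))
```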
Actionable Insights¶
- We need to collect more images, preferably negative cases of real workers in the field without helmets (rather than close-up shots), and expand the dataset to retrain the model.
Other Observations Worth Noting and Discussing¶
- High Model Performance: The VGG-16 based models achieved perfect or near-perfect accuracy on the test set. This indicates that the models are highly effective for the given dataset. The features learned by VGG-16 on ImageNet are highly transferable to this problem.
- Transfer Learning: The pre-trained VGG-16 model, even without fine-tuning, performed exceptionally well. This highlights the power of transfer learning for computer vision tasks, especially when the dataset is small.
- Data Quality: The dataset is small and relatively easy, so all models performed well, including the base CNN without any pretrained components. The non-helmet images are close-ups, which is one reason the models learned so quickly.
- Data Augmentation: While data augmentation is a standard practice to prevent overfitting and improve generalization, the model without data augmentation already performed perfectly. With early stopping, the augmented data model training stopped very early. This suggests the original dataset might be relatively easy for the model to learn.
Power Ahead!